This report explores a dataset containing an expert quality assessment and chemical composition data (11 variables) for 4,898 white wines (all Portuguese ‘Vinho Verde’).
## [1] "whitewine dataframe"
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality
## Min. : 8.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
Preliminary assessment of the data suggests that the quality score may often be usefully treated as a factor variable, as it only takes integer scores (and only scores from 3-9 are seen, so it is effectively a 7-point scale for this dataset). An additional variable (quality.factor) is introduced for when a factor scale is more useful.
A quick view shows that quality scores range from 3-9, with the majority between 5-7. Scores show a roughly normal distribution. It is also worth noting that there are very few wines at the extremes of the scale: 20 wines score 3 and only 5 wines score 9.
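The report does not show how quality.factor was built. A minimal sketch, assuming it is simply an ordered factor of the integer quality score (the `wines` data frame and its values are made up for illustration):

```r
# Toy stand-in for the whitewine data frame (values are made up)
wines <- data.frame(quality = c(6L, 5L, 7L, 3L, 9L, 6L, 8L, 4L))

# Ordered factor, so that comparisons such as < and > stay meaningful;
# levels 3:9 match the range of scores observed in the report
wines$quality.factor <- factor(wines$quality, levels = 3:9, ordered = TRUE)

# Counts per level, including levels with no observations
table(wines$quality.factor)
```

Fixing the levels explicitly keeps all seven scores on plot axes even when a subset of the data contains no wines at one of them.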
Residual sugar, when expanded to a log_10 scale, shows a bimodal distribution, with peaks at c. 1.3 and 10 (red lines). White wines are commonly regarded as either ‘dry’ or ‘sweet’ - so perhaps it is worth splitting the whitewines into these two categories for the analysis, split at around residual.sugar = 3 (the orange line)?
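The plotting code is not shown in the report. One way such a view could be produced with base graphics is sketched below. The data are simulated (a mixture of two log-normal groups chosen only to mimic the described shape), so this illustrates the technique rather than reproducing the actual figure:

```r
set.seed(1)

# Simulated stand-in for whitewine$residual.sugar (NOT the real data):
# a mixture of two log-normal groups centred near 1.3 and 10
sugar <- c(rlnorm(400, meanlog = log(1.3), sdlog = 0.3),
           rlnorm(600, meanlog = log(10),  sdlog = 0.4))

# Bin the log10-transformed values without drawing anything yet
h <- hist(log10(sugar), breaks = 30, plot = FALSE)

# Draw the histogram, then mark the two peaks (red) and the split (orange)
plot(h, main = "Residual sugar (log10 scale)",
     xlab = "log10(residual.sugar)")
abline(v = log10(c(1.3, 10)), col = "red")
abline(v = log10(3), col = "orange")
```

Transforming the values before binning gives equal-width bins on the log scale, which is what makes the two modes visible.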
# creates new sweetness variable based on residual.sugar content
whitewine$sweetness <- factor(ifelse(
  whitewine$residual.sugar >= 3, 'sweet', 'dry'))
# also adds new dataframes of only sweet or dry white wines
whitewine.sweet <- whitewine[whitewine$sweetness == 'sweet',]
whitewine.dry <- whitewine[whitewine$sweetness == 'dry',]
I have created a new variable ‘sweetness’, with values ‘dry’ and ‘sweet’.
NOTE Technically this is probably a bivariate plot - I only include it here to clarify the split I have made in the data at this point, as it features in future analysis.
## [1] "whitewine.sweet dataframe"
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0900 Min. :0.0000
## 1st Qu.:1300 1st Qu.: 6.400 1st Qu.:0.2200 1st Qu.:0.2600
## Median :2506 Median : 6.800 Median :0.2700 Median :0.3100
## Mean :2492 Mean : 6.874 Mean :0.2871 Mean :0.3364
## 3rd Qu.:3716 3rd Qu.: 7.300 3rd Qu.:0.3300 3rd Qu.:0.3900
## Max. :4895 Max. :11.800 Max. :1.1000 Max. :1.2300
##
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 3.000 Min. :0.01400 Min. : 2.00
## 1st Qu.: 6.000 1st Qu.:0.03700 1st Qu.: 27.00
## Median : 8.400 Median :0.04400 Median : 37.00
## Mean : 9.335 Mean :0.04674 Mean : 38.68
## 3rd Qu.:12.400 3rd Qu.:0.05100 3rd Qu.: 50.00
## Max. :65.800 Max. :0.34600 Max. :131.00
##
## total.sulfur.dioxide density pH sulphates
## Min. : 18.0 Min. :0.9887 Min. :2.720 Min. :0.2200
## 1st Qu.:118.0 1st Qu.:0.9936 1st Qu.:3.080 1st Qu.:0.4100
## Median :149.0 Median :0.9955 Median :3.160 Median :0.4700
## Mean :149.6 Mean :0.9954 Mean :3.171 Mean :0.4838
## 3rd Qu.:179.0 3rd Qu.:0.9975 3rd Qu.:3.250 3rd Qu.:0.5400
## Max. :366.5 Max. :1.0390 Max. :3.820 Max. :1.0800
##
## alcohol quality quality.factor sweetness
## Min. : 8.00 Min. :3.000 3: 12 dry : 0
## 1st Qu.: 9.30 1st Qu.:5.000 4: 78 sweet:3030
## Median : 9.90 Median :6.000 5: 980
## Mean :10.23 Mean :5.844 6:1371
## 3rd Qu.:11.00 3rd Qu.:6.000 7: 480
## Max. :14.05 Max. :9.000 8: 107
## 9: 2
## [1] "whitewine.dry dataframe"
## X fixed.acidity volatile.acidity citric.acid
## Min. : 2 Min. : 4.200 Min. :0.0800 Min. :0.0000
## 1st Qu.:1145 1st Qu.: 6.200 1st Qu.:0.1900 1st Qu.:0.2700
## Median :2351 Median : 6.700 Median :0.2500 Median :0.3200
## Mean :2381 Mean : 6.823 Mean :0.2638 Mean :0.3307
## 3rd Qu.:3593 3rd Qu.: 7.300 3rd Qu.:0.3100 3rd Qu.:0.3800
## Max. :4898 Max. :14.200 Max. :1.0050 Max. :1.6600
##
## residual.sugar chlorides free.sulfur.dioxide
## Min. :0.600 Min. :0.00900 Min. : 3.00
## 1st Qu.:1.200 1st Qu.:0.03400 1st Qu.: 19.00
## Median :1.500 Median :0.04000 Median : 28.00
## Mean :1.617 Mean :0.04421 Mean : 29.84
## 3rd Qu.:1.900 3rd Qu.:0.04800 3rd Qu.: 38.00
## Max. :2.900 Max. :0.27100 Max. :289.00
##
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.740 Min. :0.2500
## 1st Qu.: 96.0 1st Qu.:0.9906 1st Qu.:3.100 1st Qu.:0.4100
## Median :117.0 Median :0.9918 Median :3.210 Median :0.4800
## Mean :120.1 Mean :0.9918 Mean :3.216 Mean :0.4997
## 3rd Qu.:142.2 3rd Qu.:0.9930 3rd Qu.:3.320 3rd Qu.:0.5600
## Max. :440.0 Max. :0.9980 Max. :3.810 Max. :1.0600
##
## alcohol quality quality.factor sweetness
## Min. : 8.00 Min. :3.000 3: 8 dry :1868
## 1st Qu.:10.10 1st Qu.:5.000 4: 85 sweet: 0
## Median :10.90 Median :6.000 5:477
## Mean :10.97 Mean :5.933 6:827
## 3rd Qu.:11.80 3rd Qu.:7.000 7:400
## Max. :14.20 Max. :9.000 8: 68
## 9: 3
Let’s look at quality again.
Quality distribution of sweet and dry whitewines appears similar.
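Because the two groups differ in size (3,030 sweet vs. 1,868 dry wines), the comparison is easier to make on proportions than on raw counts. A short sketch using the quality.factor counts printed in the two summaries above:

```r
# quality.factor counts taken from the summary() output for each group
counts <- rbind(
  sweet = c(12, 78, 980, 1371, 480, 107, 2),
  dry   = c(8, 85, 477, 827, 400, 68, 3)
)
colnames(counts) <- 3:9

# Row-wise proportions: share of each group's wines at each quality score
props <- prop.table(counts, margin = 1)

# Shown as percentages, rounded to one decimal place
round(100 * props, 1)
```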
Fixed acidity shows a relatively narrow normal distribution about a median of 6.8, with no noticeable sweet / dry difference. Black line is median fixed.acidity of all wines.
Volatile acidity shows a slightly right skewed normal distribution about a median of 0.26, with a few outliers at >0.8. Black line is median volatile.acidity of all wines.
Citric acid shows a generally normal distribution, with an odd peak at c. 0.49, and a smaller one at c. 0.74 (red lines). Black line is median citric.acid of all wines.
A ‘zoom’ into the histogram in this region shows that there certainly appears to be a local ‘spike’ in the data at citric.acid = 0.49. I wonder if there is some form of target / guideline to aim for ‘below 0.5’ for citric.acid during the wine making process? (Could this be Goodhart’s Law in action?) Or alternatively, could this be some measurement artefact?
Zooming in again, there is a similar (though smaller) local ‘spike’ at 0.74 - again, is there some effect clustering values below a ‘round’ value of 0.75?
There appears to be some effect causing a local spike in citric.acid at 0.49 and 0.74, just below the ‘round’ values of 0.5 and 0.75. My suspicion at this point is that 0.5 and 0.75 could be some form of ‘target levels’ which winemakers aim to be below.
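A spike like this can also be checked numerically by tabulating the values at their recorded precision rather than relying on histogram bins. The sketch below uses simulated data with an artificial cluster planted at 0.49, so it only demonstrates the check and says nothing about the real dataset:

```r
set.seed(42)

# Simulated stand-in for whitewine$citric.acid (NOT the real data):
# bell-shaped values plus an artificial cluster of 150 values at 0.49,
# all rounded to two decimals
citric <- round(c(rnorm(2000, mean = 0.33, sd = 0.12), rep(0.49, 150)), 2)
citric <- citric[citric >= 0]

# Count how often each distinct value occurs
counts <- table(citric)

# Restrict to a window around the suspected spike
value <- as.numeric(names(counts))
window <- counts[value >= 0.45 & value <= 0.53]
window

# The most frequent value within the window
names(which.max(window))
```

If the value just below a round number is far more common than its neighbours on both sides, that supports a target or rounding effect over a binning artefact.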
Chlorides show a tight normal distribution about a median of 0.043, with a few outliers above 0.1. Here the dry wines show slightly lower levels of chlorides than sweet wines. Black line is median chlorides of all wines.
Free sulfur dioxide shows a (very) slightly right skewed distribution about a median of 34.0. There is a noticeable difference here between sweet and dry, with dry wines showing lower free.sulfur.dioxide. Black line is median free.sulfur.dioxide of all wines.
Total sulfur dioxide shows a more symmetrical normal distribution about a median of 134.0. Again, there is a noticeable difference here between sweet and dry, with dry wines showing lower total.sulfur.dioxide. Black line is median total.sulfur.dioxide of all wines.
Density shows a normal distribution in a very tight range (mostly 0.99 < density < 1.00). A marked difference between sweet and dry is clear: dry wines are lower density. Black line is median density of all wines.
pH shows a ‘very’ neat normal distribution around a median (all wines) of 3.18 (black line).
Sulphates show a slightly right skewed distribution around a median (all wines) of 0.47.
Alcohol shows a more ‘spread’ distribution, with median of 10.4. And a clear difference between sweet and dry wines. Let’s look at them separately.
Dry wines now show a more symmetrical normal distribution, while sweet wines show a noticeable ‘right skew’. Black line is median alcohol of all wines.
The dataset contains 11 chemical parameters and an expert quality assessment (on a scale of 1-10) for 4,898 white wines. All of the chemical parameters are measurements on a continuous scale. The quality score takes integer values only, and only scores from 3-9 are observed, so for some aspects of this analysis it makes sense to consider quality as an ordered factor variable.
Most wines have a quality score of 5-7, with few scoring either very high (quality.factor = 9) or very low (quality.factor = 3) quality scores.
Most of the chemical parameters show normal or slightly skewed distributions, with a few exceptions worth noting:
Based on the bimodal distribution of residual.sugar, and knowledge that white wines are conventionally classified as ‘sweet’ or ‘dry’ based on sugar / sweetness, it makes sense to classify the population into sweet and dry wines (residual.sugar >= 3 vs. < 3), and see how the analyses & correlations vary between these two sets. The sulfur.dioxide variables (free. and total.) both show differences between sweet and dry wines, as do density and alcohol.
I certainly think it will be useful to consider the difference between the sweet and dry wines for future analysis. At this stage it is hard to know which variables or correlations will prove most useful.
It appears worthwhile to add a ‘quality.factor’ variable - for views where quality is more usefully considered as a factor in subsequent plots.
I also added a ‘sweetness’ variable, to split the data up into ‘sweet’ and ‘dry’ wines (residual.sugar >= 3 vs. < 3).
I used a log_10 scale for the residual sugar to highlight the bimodality of the distribution, and enable the splitting between sweet and dry.
Let’s start by looking at the correlation matrix for the whole dataset.
An initial look at a correlation matrix indicates a few areas of interest:
I thought it might also be interesting to see where the correlation matrices show the greatest differences when evaluated separately for the ‘sweet’ and ‘dry’ white wines:
# calculating the 'difference' between the correlation matrices for
# sweet and dry wines
m.sweet <- cor(whitewine.sweet[c(13, 2:12)])
m.dry <- cor(whitewine.dry[c(13, 2:12)])
m.diff <- m.sweet - m.dry
m.diff
## quality fixed.acidity volatile.acidity
## quality 0.000000000 0.103847402 0.060402068
## fixed.acidity 0.103847402 0.000000000 0.086271067
## volatile.acidity 0.060402068 0.086271067 0.000000000
## citric.acid -0.097525268 0.003264288 0.192705704
## residual.sugar -0.319537384 0.153287995 -0.140546252
## chlorides -0.052932297 -0.019885777 0.023636359
## free.sulfur.dioxide -0.219418791 0.089155598 -0.008518115
## total.sulfur.dioxide -0.139796651 0.023420972 -0.005980721
## density 0.079336798 -0.140449627 -0.051435814
## pH -0.108615676 0.078145055 0.038409079
## sulphates -0.185374791 0.039546297 0.207529360
## alcohol 0.003887193 0.077513043 0.106199309
## citric.acid residual.sugar chlorides
## quality -0.097525268 -0.31953738 -0.05293230
## fixed.acidity 0.003264288 0.15328800 -0.01988578
## volatile.acidity 0.192705704 -0.14054625 0.02363636
## citric.acid 0.000000000 0.19792871 -0.29290577
## residual.sugar 0.197928706 0.00000000 0.13804589
## chlorides -0.292905769 0.13804589 0.00000000
## free.sulfur.dioxide 0.177508348 0.14656752 0.08945506
## total.sulfur.dioxide 0.155075239 0.15423363 0.08997537
## density 0.090428010 0.83819069 -0.07232324
## pH 0.024615905 -0.21846280 0.02422776
## sulphates 0.011457022 -0.09799763 0.04519596
## alcohol -0.097298545 -0.66917474 0.03628726
## free.sulfur.dioxide total.sulfur.dioxide density
## quality -0.219418791 -0.139796651 0.07933680
## fixed.acidity 0.089155598 0.023420972 -0.14044963
## volatile.acidity -0.008518115 -0.005980721 -0.05143581
## citric.acid 0.177508348 0.155075239 0.09042801
## residual.sugar 0.146567517 0.154233627 0.83819069
## chlorides 0.089455056 0.089975367 -0.07232324
## free.sulfur.dioxide 0.000000000 0.101141224 0.31209367
## total.sulfur.dioxide 0.101141224 0.000000000 0.13002179
## density 0.312093672 0.130021789 0.00000000
## pH -0.126954901 -0.088100208 -0.17722298
## sulphates -0.035318735 0.054952867 -0.01359132
## alcohol -0.300623358 -0.217728232 0.05682155
## pH sulphates alcohol
## quality -0.10861568 -0.18537479 0.003887193
## fixed.acidity 0.07814506 0.03954630 0.077513043
## volatile.acidity 0.03840908 0.20752936 0.106199309
## citric.acid 0.02461591 0.01145702 -0.097298545
## residual.sugar -0.21846280 -0.09799763 -0.669174742
## chlorides 0.02422776 0.04519596 0.036287255
## free.sulfur.dioxide -0.12695490 -0.03531873 -0.300623358
## total.sulfur.dioxide -0.08810021 0.05495287 -0.217728232
## density -0.17722298 -0.01359132 0.056821549
## pH 0.00000000 -0.07551512 0.095147122
## sulphates -0.07551512 0.00000000 -0.047923762
## alcohol 0.09514712 -0.04792376 0.000000000
If we look for the largest differences:
It is probably not surprising that the greatest differences are in correlations involving residual.sugar, as this is the dimension we have used to ‘split’ the dataset. But it will be interesting to view these different relationships across sweet vs. dry wines in the analysis below.
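Ranking the pairs by absolute difference makes the largest entries easy to pick out. The sketch below applies this to a four-variable subset of the m.diff values printed above (rounded to four decimals); the same steps should work on the full m.diff matrix:

```r
# Subset of the printed m.diff matrix (values rounded to 4 decimals)
vars <- c("quality", "residual.sugar", "density", "alcohol")
m <- matrix(0, nrow = 4, ncol = 4, dimnames = list(vars, vars))
m["quality", "residual.sugar"] <- -0.3195
m["quality", "density"]        <-  0.0793
m["quality", "alcohol"]        <-  0.0039
m["residual.sugar", "density"] <-  0.8382
m["residual.sugar", "alcohol"] <- -0.6692
m["density", "alcohol"]        <-  0.0568
m <- m + t(m)  # mirror the upper triangle to make the matrix symmetric

# Keep each pair once (upper triangle), then sort by absolute difference
idx <- which(upper.tri(m), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(m)[idx[, "row"]],
  var2 = colnames(m)[idx[, "col"]],
  diff = m[upper.tri(m)]
)
pairs <- pairs[order(-abs(pairs$diff)), ]
head(pairs, 3)
```

Both `which(upper.tri(m), arr.ind = TRUE)` and `m[upper.tri(m)]` walk the matrix in column-major order, so the names and values line up.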
A review of the ‘pair plot’ matrix (split into 2, and using quality as a factor in each), highlights a few features worth investigating:
Let’s look at the boxplots vs. quality.factor in more detail.
NOTE It is important to remember in considering the plots below that only a small number of wines show quality scores of 9 (5 wines) or 3 (20 wines). So interpretation of the plots should focus principally on the quality.factor range 4-8.
Looking again (at a slightly greater scale) at boxplots of the various chemical parameters across quality scores, density, alcohol and chlorides show the clearest and most consistent trends with quality. It is worth examining some of these parameters more closely, to see how the quality relationship might vary between sweet and dry whitewines.
Density shows a clear negative trend with quality for both dry and sweet, but the variation across quality is more pronounced for the sweet wines. And there is a clear variation between optimal densities for sweet vs. dry white wines.
Alcohol shows a broadly positive correlation with quality for both sweet and dry wines, but the relationship appears more pronounced for sweet wines.
Chlorides do indeed show a (negative) correlation with quality, at slightly lower levels for dry wines than for sweet.
As the basis for the separation between dry & sweet, clearly residual sugar shows very different quality variation across the two groups. The trend for sweet whitewines is slightly negative: higher quality wines have less residual.sugar. However for dry whitewines, the trend is reversed - with higher quality wines showing slightly higher levels of residual.sugar.
Total.sulfur.dioxide shows a clear negative trend against increasing quality for sweet wines, but a much less pronounced trend for dry wines.
The trend of sulphates with quality is slight, but does appear to be positive for dry wines and negative for sweet wines.
There is little consistent variation in pH with quality for sweet wines, but a very clear positive correlation for dry wines (higher pH means better wine).
Alcohol and density show a clear negative correlation, and also a clear separation between sweet and dry wines (dry wines generally having lower density for a given alcohol level).
There is a clear positive correlation between residual.sugar and density for sweet wines, but no indication of any significant correlation for dry wines (albeit with a restricted range of residual.sugar).
Fixed.acidity shows a slight positive correlation with density, and also a reasonable separation between sweet and dry wines (dry wines are generally lower density for a given fixed.acidity).
The trends for residual.sugar vs. alcohol are less clear, but regression lines show different slope directions in sweet vs. dry wines.
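The slope comparison can be made explicit by fitting a separate linear regression within each group. In the sketch below the data are simulated, and the slope signs built into the simulation are arbitrary assumptions for illustration, not estimates from the wine data:

```r
set.seed(7)
n <- 200

# Simulated stand-in (NOT the real data): dry wines below the
# residual.sugar = 3 split, sweet wines at or above it
toy <- data.frame(
  sweetness      = rep(c("dry", "sweet"), each = n),
  residual.sugar = c(runif(n, 0.6, 2.9), runif(n, 3, 20))
)

# Opposite slopes are built in by assumption, plus random noise
toy$alcohol <- ifelse(toy$sweetness == "dry",
                      10.5 + 0.5 * toy$residual.sugar,
                      11.5 - 0.15 * toy$residual.sugar) +
  rnorm(2 * n, sd = 0.5)

# Fit alcohol ~ residual.sugar within each group and extract the slope
slopes <- sapply(split(toy, toy$sweetness), function(d) {
  coef(lm(alcohol ~ residual.sugar, data = d))[["residual.sugar"]]
})
slopes
```

With the report's data frames, the same `lm()` call could be run on whitewine.dry and whitewine.sweet directly.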
There is some positive correlation between total.sulfur.dioxide and free.sulfur.dioxide, but no clear separation between dry and sweet wines.
It is now clear from this analysis that it does make sense to consider the data for sweet and dry white wines separately when looking at some of the relationships. The strength / level of effect of certain variables on quality can vary between the two groups, and may even be in the reverse direction.
Density and alcohol correlate well (negatively) with each other for both sweet and dry wines, but with a clear separation between the datasets (sweet wines are higher density at given alcohol).
Residual.sugar shows a clear positive correlation with density for sweet wines (sweeter wines are denser), but no clear relationship is seen for dry wines. This may be a by-product of the restricted range of residual.sugar for dry wines (<3) based on the definition of ‘dry’ wines, but the lack of any correlation at all is striking.
Fixed.acidity also shows a positive correlation with density for both sets of wines, again with good separation between the datasets (sweet wines are denser at a given fixed.acidity).
There are only a few wines with quality scores <4 or >8, and they make insights harder to visualise on some plots, so for this section I will create a new set of ‘clipped’ data, with quality scores of 3 and 9 removed. (NOTE there are only 5 wines of quality 9 and 20 of quality 3 out of 4,898 wines in the data)
whitewine.sweet.clip <- whitewine.sweet[whitewine.sweet$quality.factor != 3
& whitewine.sweet$quality.factor != 9 ,]
whitewine.dry.clip <- whitewine.dry[whitewine.dry$quality.factor != 3
& whitewine.dry$quality.factor != 9 ,]
whitewine.clip <- whitewine[whitewine$quality.factor != 3
& whitewine$quality.factor != 9 ,]
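One detail to watch with this kind of subsetting: a factor keeps its full set of levels after rows are removed, so the empty levels 3 and 9 can still appear on plot axes and in tables. A minimal sketch of an equivalent filter that also drops the unused levels (the `wines` data frame is a made-up toy example):

```r
# Toy stand-in for the whitewine data frame (values are made up)
wines <- data.frame(quality = c(3L, 4L, 5L, 6L, 6L, 7L, 8L, 9L))
wines$quality.factor <- factor(wines$quality, ordered = TRUE)

# Same filter as above, written with %in%
clip <- wines[!(wines$quality.factor %in% c(3, 9)), ]

# Remove the now-empty levels 3 and 9 from the factor
clip$quality.factor <- droplevels(clip$quality.factor)
levels(clip$quality.factor)
```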
Considering dry wines, it is clear that quality generally improves as alcohol increases and density decreases.
And a similar relationship can be seen in sweet wines.
Comparing the two plots, the pattern is the same across both - with the sweet wines showing a generally higher density.
Looking at other parameters, for dry wines it would appear that for a given density, a higher pH is likely to be a better wine, though the relationship is not strong.
A similar pattern is harder to find for sweet wines, but there does appear to be some indication that wines with higher residual.sugar also need higher alcohol levels to be of high quality.
The correlation between density and alcohol appears valuable in contributing to quality. For both sweet and dry wine datasets, quality tends to increase as alcohol rises and density decreases. Separating the data into sweet and dry datasets allows this relationship to be more clearly seen.
Other variables that reinforce the quality.factor are harder to discern, but it does appear that for dry wines, higher pH (at a given density) might be a useful indicator, and for sweet wines higher residual.sugar (at a given alcohol) might also be a useful indicator.
Nothing that I thought was particularly surprising.
I didn’t create any models, but it does seem likely that model accuracy would be higher if the sweet and dry wines were considered and modelled separately.
A histogram of residual.sugar, expanded to a log_10 scale, shows the clear bimodal distribution of residual sugar, and the division of the data into ‘sweet’ and ‘dry’ wines at residual.sugar = 3.
A view of how density (and its distribution) varies with quality, for sweet and dry wines. Both sweet and dry wines show a clear trend of decreasing density with increasing quality, but at different levels of absolute density for sweet vs. dry wines. Dry wines are generally less dense and show a lower IQR than sweet wines of the same quality. And the variation in average density between ‘good’ (quality = 8) and ‘bad’ (quality = 4) dry wines is smaller than the average density variation between the same quality levels for sweet wines.
For both dry and sweet white wines, there is a clear pattern that quality improves as alcohol increases and density falls. The distributions for sweet vs. dry are separate but overlapping (sweet wines are denser at a given alcohol), and so it is clearer to view them separately.
This has been an interesting exercise in ‘getting under the skin’ of an unfamiliar dataset, and trying to extract and discern trends and relationships within it.
I spent some time looking at the various analyses before the insight came to me that it could be instructive to separate the data into ‘sweet’ and ‘dry’ wines, and view them separately. The bimodality of the residual.sugar distribution, and knowledge that white wines are commonly classified as ‘sweet’ or ‘dry’, made this feel like a reasonable step. And it does appear that winemakers are driving different aspects of the wine’s chemistry to make a ‘good’ sweet wine vs. a ‘good’ dry wine.
Once it was clear from the histograms of total.sulfur.dioxide and density that there were other differences between the sweet and dry wines, I felt more confident in the decision to consider them separately.
From the univariate analysis, a feature of interest was certainly the slightly anomalous ‘spikes’ in the distribution of citric.acid at 0.49 and 0.74. The fact that these are just below some ‘round’ levels (of 0.5 and 0.75) still leaves me thinking there is some sort of targeting of citric.acid below these levels during the wine making process. But I suspect that explaining this anomaly would need further research and more data than is available in this dataset, and so it is outside the scope of this work.
It was disappointing, perhaps, not to see any really strong and clear correlations (especially with quality). But I guess that’s what ‘real world data’ is like. Separating out to sweet and dry did seem to bring out some clarity though, and it became clear looking at the bivariate plots that factors that drive higher quality for sweet wines are certainly not exactly the same as those for dry wines. Alcohol and density remain the strongest drivers of quality for both (but at different levels), but more secondary factors (pH, residual.sugar) clearly vary between the two types.
It seems clear to me that modelling of this data (for quality) is likely to be much more effective if done separately for sweet vs. dry wines. And it could therefore be worth investigating in more detail how to improve this division (perhaps a combination of factors, rather than residual.sugar alone?).